Abstract

In this paper, we study the problem of processing continuous range queries in a hierarchical 
wireless sensor network. Recently, as the size of sensor networks increases due to the growth of 
ubiquitous computing environments and wireless networks, building wireless sensor networks in a 
hierarchical configuration is put forth as a practical approach. Contrasted with the traditional 
approach of building networks in a "flat" structure using sensor devices of the same capability, the 
hierarchical approach deploys devices of higher capability in a higher tier, i.e., a tier closer to the 
server. While query processing in flat sensor networks has been widely studied, query
processing in hierarchical sensor networks has received little attention. In wireless sensor networks, the main
costs that should be considered are the energy for sending data and the storage for storing queries.
There is a trade-off between these two costs. Based on this, we first propose a progressive processing
method that effectively processes a large number of continuous range queries in hierarchical sensor 
networks. The proposed method uses the query merging technique proposed by Xiang et al. as 
the basis and additionally considers the trade-off between the two costs. More specifically, it works 
toward reducing the storage cost at lower-tier nodes by merging more queries, and toward reducing 
the energy cost at higher-tier nodes by merging fewer queries (thereby reducing "false alarms"). 
We then present how to build a hierarchical sensor network that is optimal with respect to the 
weighted sum of the two costs. It allows for a cost-based systematic control of the trade-off based 
on the relative importance between the storage and energy in a given network environment and 
application. Experimental results show that the proposed method achieves a near-optimal control 
between the storage and energy and reduces the cost by 0.989 ~ 84.995 times compared with the
cost achieved using the flat (i.e., non-hierarchical) setup as in the work by Xiang et al. 
1 Introduction



As the computing environment evolves toward ubiquitous computing, there has been increasing atten- 
tion and research on sensor networks. In the sensor networks environment, sensor nodes are connected 
through the network to the server (or base station), which collects data sensed at the nodes p]. Example
applications in such an environment include environment monitoring (e.g., temperature, humidity),
manufacturing process tracking, traffic monitoring, and intrusion detection in a surveillance system. 

In particular, as wireless networks become more common, there has been a lot of research on
wireless sensor networks in which sensor nodes are connected in an ad-hoc network configuration in
order to reduce the cost of deployment. In general, the objective in a wireless sensor network is to
deploy cheap sensor nodes with limited resources (e.g., battery power, storage space) effectively and to
collect data from those sensor nodes by using their limited resources efficiently [8].

There is an increasing trend lately toward large-scale wireless sensor networks [ 1 2 [ 1 1 3 j as the scope
of applications extends to municipality management, global environmental monitoring, etc. These 
networks typically aim at supporting a large number of sensor nodes deployed in a large area for use 
by a large number of users. For example, in the Network for Observation of Volcanic and Atmospheric
Change (NOVAC) project [TT], wireless sensor networks deployed at 15 volcanoes spread across five
continents are connected in a multi-tier configuration to support global volcano monitoring. As
another example, EarthNet Online [3] collects earth observation information such as worldwide
weather and bird migrations through wireless sensor networks and makes the information available to
thousands of individuals or organizations. This increase in scale will bring about a proportionate
increase in the number of concurrent queries and the amount of sensor data. Thus, we expect
effective processing of a large number of queries and a high volume of data to become increasingly important in
wireless sensor networks. In addition, we expect that building such large-scale wireless sensor networks
economically will be important as well.

In this regard, in this paper, we consider the storage needed to store queries in sensor
nodes and the energy consumption (i.e., battery capacity) needed to send the collected data from those nodes
to the server. There exists a trade-off between these two cost factors. Let us explain this trade-off with
the centralized approach and the distributed approach [TS], which are the two naive approaches to building
wireless sensor networks. In the centralized approach, the sensor nodes do not store any query and
simply send all data to the server, which then processes all the queries on the data received. In this
case, there is no storage cost for storing queries in individual sensor nodes, but the energy cost is very high.
In the distributed approach, on the other hand, individual sensor nodes store all the queries and send
only the results of processing the queries to the server, which then simply collects the received query
results (this scheme is known as in-network query processing [21]). In this case, the energy cost can be
reduced but the storage cost is high.

Neither of these two approaches is suitable for building large scale sensor networks. In the central- 
ized approach, since data are accumulated over the course of being relayed toward the server, sensor 
nodes near the server should send more data than the nodes farther from the server. As the number of 
nodes increases, this phenomenon will become more serious. In other words, sensor nodes closer to the 
server consume more energy than other nodes farther from the server - for sending not only the data 
generated by themselves but also the data received from other nodes; as a result, those nodes will
exhaust their batteries within a short time. Thus, the centralized approach is not appropriate for large scale sensor
networks. On the other hand, the distributed approach becomes infeasible as the number of queries
increases. A sensor node is not able to process a large number of queries due to the limitation on its
memory and computing power. Consider as an example inexpensive Mica motes [13], which typically
have only 8~128 Kbyte flash memory and 0.5~8 Kbyte RAM. Suppose a mote has 64 Kbyte flash memory
and 10% of it is available for storing two-dimensional range queries. Additionally, suppose that
each attribute value of a query is a real number four bytes long and that the selection condition of
a query is expressed as c1 op1 A op2 c2 (A: attribute name; c1 and c2: attribute values; op1 and op2:
binary comparison operators). Then, the size of one query is at least 16 bytes [S] and, thus, at most 400
queries can be stored in one mote. Obviously, these motes fall far short of storing the thousands of queries
expected in large scale networks. Upgrading the sensor nodes to those with large enough memory will
raise the expense, which is not acceptable when there are so many sensor nodes to be deployed.
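The capacity estimate in the mote example can be reproduced with a few lines of arithmetic. The following is only a sketch under stated assumptions: it takes 1 Kbyte to be 1000 bytes (with 1024 bytes the bound would be 409 rather than 400), and all constant names are illustrative rather than taken from the paper.

```python
# Worked arithmetic for the mote example (assumption: 1 Kbyte = 1000 bytes).
FLASH_BYTES = 64 * 1000        # 64 Kbyte flash memory
QUERY_SHARE_PERCENT = 10       # 10% of the flash is available for queries
DIMENSIONS = 2                 # two-dimensional range queries
BOUNDS_PER_DIMENSION = 2       # c1 and c2 for each attribute
BYTES_PER_VALUE = 4            # each attribute value is a four-byte real

# One query needs a lower and an upper bound per dimension.
query_bytes = DIMENSIONS * BOUNDS_PER_DIMENSION * BYTES_PER_VALUE

# Integer arithmetic avoids floating-point rounding in the budget.
budget_bytes = FLASH_BYTES * QUERY_SHARE_PERCENT // 100
max_queries = budget_bytes // query_bytes

print(query_bytes, budget_bytes, max_queries)
```

The result is a per-mote bound in the hundreds, which is why thousands of concurrent queries cannot be stored on such nodes without merging.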

Recently, in order to overcome these large scale problems, building wireless sensor networks in a 
hierarchical configuration is considered a practical alternative. A hierarchical wireless sensor network 
is organized in a multi-tier architecture [2] configured with sensor nodes having different amounts of
resources and computation power. Nodes closer to the server have more resources and computation 
power than those farther from the server, and this makes it possible to carry out the processing that 
cannot be done with low-capacity nodes only. In hierarchical wireless sensor networks, nodes with 
smaller resources and computing power are recursively connected to nodes with more resources and 
computing power [Mj 1161 E]; thus, nodes near the server are capable of handling the larger amount 
of data accumulated from lower tiers. We think this configuration is suitable for resolving the query 
processing problem in large-scale networks mentioned above. Currently, however, mainstream
research on wireless sensor network query processing targets flat sensor networks (i.e., sensor networks that
consist of nodes with the same capability). Accordingly, research on query processing for hierarchical 
sensor networks has been less than adequate. 

This paper proposes a method for building large scale hierarchical sensor networks to process 
queries effectively with respect to the trade-off between the energy cost and the storage cost. The queries 
considered in this paper are continuous range queries. Range queries are an important query type in 
many sensor network applications, particularly in monitoring applications JF, and there has been active
research done to improve range query processing performance [6]. The method proposed in this paper is
based on the technique of systematically controlling the trade-off between the energy cost and the storage
cost through controlled merging of queries with similar ranges. There are existing methods proposed to
reduce the energy cost by merging queries to avoid duplicate transmission of query results [10] [19] [20].
They, however, all focus on flat sensor networks and, therefore, cannot utilize the characteristics of
hierarchical sensor networks in which nodes at different tiers have different capabilities. Besides, they
do not reflect the trade-off at all because they do not consider the storage cost.
In contrast, in this paper, we fully utilize the characteristics by employing a progressive approach, which 
merges increasingly more queries as the tier goes from the server toward the lowest tier and, in this
way, finds the optimal merging at each tier in consideration of the trade-off. More specifically, at lower
tier nodes, which are larger in number, the approach works toward reducing the storage requirement 
by reducing the number of queries through more aggressive merging; in contrast, at higher tier nodes, 
which are smaller in number, the approach works toward storing more queries through less aggressive 
merging and, in return, reducing the energy consumption by increasing the query accuracy by filtering 
out more unnecessary data. 

In this paper, we first propose the model and algorithms of the progressive query processing method. 
This method has two phases: query merging and query processing. The key idea in the query merging 
phase is to merge queries progressively as the tier goes from the highest (i.e., the server) to the lowest. 
In other words, it merges the input queries to recursively generate queries to be stored at the next tier 
nodes, first merging the input queries to generate queries for the second tier nodes, and then merging 
them to generate the ones for the third tier nodes, and so on. We say that the queries thus stored at
multiple tiers form the inverted hierarchical query structure as a whole.



The inverted hierarchical query structure is a new structure proposed in this paper. It is built from
a multi-dimensional index storing the query ranges, by partitioning the index into multiple levels and
then storing the root level of the index at the lowest-tier sensor nodes and the leaf level of the index
in the server. This structure reflects the characteristic of hierarchical sensor networks that sensor
nodes at a higher tier store more detailed information while sensor nodes at a lower tier store more
abstract information. Thus, the structure is the inverse of a general tree-like index structure.

In the query processing phase, the queries are processed progressively, that is, by refining the query 
result to be more accurate as data are sent from a lower tier to a higher tier. For this, the inverted 
hierarchical query structure is used to retrieve the query result at each tier. 

Next, we propose a method that builds an optimal hierarchical sensor network by systematically 
controlling the trade-off between the storage cost and the energy cost according to their weights. Since 
the relative importance between the two costs may vary depending on the application and environment, 
we formulate the cost of building the network as a weighted sum of the two costs and minimize the
total cost. As the optimization target parameter, we use the optimal merge rate - the average rate of
merging queries at each tier.

^ The inverted hierarchical query structure is a forest structure, to be more precise (see Figure 2).
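As a reading aid, the weighted-sum objective described in this paragraph can be written schematically as follows. The symbols $w_s$, $w_e$, $C_{\mathrm{storage}}$, $C_{\mathrm{energy}}$, and the merge rate $m$ are placeholder notation assumed here for illustration; they are not necessarily the notation of the paper's cost model, which is developed in Section 4.

```latex
\min_{m} \; C_{\mathrm{total}}(m)
  = w_s \, C_{\mathrm{storage}}(m) + w_e \, C_{\mathrm{energy}}(m)
```

Under this reading, a larger $w_s$ favors more aggressive merging (fewer stored queries), while a larger $w_e$ favors less merging (fewer false alarms to transmit).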

Finally, we show through experiments that the proposed method is useful for building a hierarchical 
sensor network in a cost effective manner. Specifically, first we show that there is little difference between 
the optimal merge rate obtained from an analytic model and the rate obtained from experiments; second, 
we show the superiority of the proposed method over the existing query processing method for flat sensor 
networks in terms of the total cost. 

The rest of this paper is organized as follows. Section 2 discusses related work. Section 3 describes 
the model and the algorithms of the proposed progressive processing method for hierarchical sensor 
networks. Section 4 proposes an analytical method for effectively building a hierarchical sensor network. 
Section 5 shows the superiority of the proposed method over the existing method through experiments. 
Section 6 concludes the paper. 

2 Related Work 

In this section, we review the existing research on the continuous range query processing in sensor 
networks and the state of the art in the hierarchical wireless sensor networks. 

2.1 Continuous range query processing in sensor networks 

In sensor networks, range query processing can be classified into single range query processing and 
multiple range query processing. Single range query processing executes only one range query in a 
system. Multiple range query processing concurrently executes many range queries in a system. 

Single continuous range query processing 

Li et al. [B] apply the data-centric storage to continuous single query processing. The query processing 
using the data-centric storage runs as follows. For storing data, each sensor node sends collected data 
to sensor nodes, where the target sensor nodes are determined by the value of the data element. For 
processing queries, the server sends a query to only those sensor nodes that have the result data of the 
query. In the same work, Li et al. study an index structure using an order-preserving hash function for
distributing data. That is, physically adjacent nodes store adjacent value ranges of data.
As a result, the method reduces the query processing cost by reducing the average
number of hops for sending queries and query results. Madden et al. [8] consider storing data in local
sensors (unlike the data-centric approach) and propose building an R-tree-like index (called SRTree)
based on the range of sensing values. Both of these works focus on single query processing. Hence,
they are not applicable to recent query processing environments that register many queries and process
them concurrently.

Multiple continuous range query processing 

Ratnasamy et al. [TBj propose two basic query processing approaches for multiple query processing in
wireless sensor networks. One approach processes queries at the server (called the centralized approach),
and the other approach processes queries at the sensor node (called the distributed approach). In the
former approach, all queries are stored in the server, and the sensor nodes send all sensed data to the 
server for query processing. This approach is effective only if the size of the region equivalent to the 
union of all query regions is close to the size of the entire domain space and, otherwise, incurs the 
overhead of sending unnecessary data to the server. This approach can reduce the memory requirement 
of the sensor nodes because it does not store any query in them, but has the disadvantage of incurring 
significant energy consumption because all data must be sent to the server. In the latter approach, each 
sensor node stores all queries disseminated from the server and sends to the server only the result of 
processing the sensor data. Thus, this approach avoids the problem of the former approach,
but has the disadvantage that the sensor nodes may not be able to store all queries due to insufficient
memory if the number of queries is large. From these two basic query processing approaches, we can
observe that there is a trade-off between the memory and the energy which are two important resources 
of sensor nodes. 

Furthermore, there has recently been research to complement the centralized approach and the
distributed approach. Specifically, the proposed methods share query processing in an overlapping
region in case there are overlapping query conditions. By identifying the overlapping regions among the
user queries and rewriting the queries accordingly, the proposed methods eliminate duplicate processing 
and duplicate data transmission. These methods can be classified into the partitioning method and the 
merging method. 

In the partitioning method, the server partitions the individual query regions into overlapping 
regions and non-overlapping regions. Then, it sends the partitioned regions and the original queries to 
sensor nodes, which store them. Query processing is done for each partitioned region, and the query 
results are merged in the server or sensor nodes. Trigoni et al. [18] and Yu et al. [22] use this method to 
process range queries on the location information of sensor nodes. This method has the advantage that 
the result of merging the results of processing each partition is the same as the result of processing the 
original queries and, therefore, no "false alarm" will happen. It, however, has the disadvantage that, if 
there are a large number of overlapping query conditions, then the number of partitions to be stored in 
certain sensor nodes increases and, thus, the necessary storage increases as well. 

In the merging method, the server merges the regions of overlapping queries into one merged query 
region. The server then sends the merged queries to the sensor nodes that store them. Query processing 
results are then "reorganized" into those of the original queries in the server or sensor node. This 
method has the advantage that it can process a large number of queries at the same time by reducing 
the number of queries stored in a sensor node. It, however, has the disadvantage that a "false alarm" 
may happen as a result of merging queries. Müller and Alonso [10] propose a method that compares the
predicates of the range queries to extract those common to all queries and generates one query that has
only the common predicates as the query condition. In this method, if there is no predicate common to
all queries, then one query with no query condition is generated, which incurs
a lot of false alarms. Xiang et al. [19] [20] propose a method which incrementally merges
overlapping query regions and processes the resulting merged queries instead of the original queries.
Here, the incremental merging continues as long as the cost of sending the false alarms occurring when queries
are merged is no larger than the cost of sending duplicate results of overlapping query regions when
queries are not merged. Xiang et al.'s query processing method can be viewed as a hybrid approach
(i.e., reducing the needed memory amount and the data transmission amount) taking advantage of both
the centralized approach and the distributed approach, but targets "flat" sensor networks in which all
sensor nodes in the network have the same capability and store the same set of merged queries. Thus,
this method has the problem that it cannot utilize the characteristics of hierarchical sensor networks.
Our method in this paper basically uses the same query merging method as Xiang et al.'s, but enhances 
it to control the rate of merging queries depending on the capabilities of individual nodes and to build 
a hierarchical sensor network. Our method has the advantage that it allows for a systematic control of 
the trade-off between the memory amount needed and the amount of data sent. 

2.2 Hierarchical wireless sensor networks 

As the scale of sensor networks increases, the hierarchical structure is used more in real applications
than the flat structure in which all sensor nodes have the same capability [2].

Representative examples of such hierarchical wireless sensor networks are PASTA (Power Aware
Sensing, Tracking and Analysis) [16] mentioned in COSMOS [16] and SOHAN [4]. PASTA is used in
military applications for enemy movement surveillance and is configured with the server and about 400 
intermediate tier nodes each clustering about 20 sensor nodes. SOHAN is used in traffic congestion 
monitoring applications to measure the traffic volume using roadside sensor nodes and is configured 
with the server and about 50 intermediate tier nodes each clustering about 200 sensor nodes. 

We expect that hierarchical sensor networks will be increasingly utilized in the future as the
scale and the requirements of applications increase. However, there has not been any research done on
processing multiple queries taking advantage of the characteristic that sensor nodes at different tiers
have different capabilities. Srivastava et al. [17] investigated how and on which node to process each
operation during query processing in a hierarchical sensor network. This research, however, mainly
deals with single query processing and, thus, is difficult to apply to multiple query processing. In this
paper, we propose a method for processing multiple queries effectively by utilizing the characteristics of 
hierarchical sensor networks, i.e., the multi-tier structure made of sensor nodes with different resources 
and computing power. 
3 Progressive processing in hierarchical wireless sensor networks

In this section, we present the progressive processing model and algorithms in hierarchical (i.e., multi- 
tier) wireless sensor networks. 

3.1 Overview 

In progressive processing, we systematically control the total processing cost by having the larger number 
of lower-capacity nodes (at lower tiers) partially process queries and the smaller number of higher- 
capacity nodes (at higher tiers) process the remainder. 

Example 1 (Progressive processing in hierarchical wireless sensor networks): Figure 1(a) shows an
example of a hierarchical sensor network organized in three tiers. The nodes at the third (i.e., lowest) 
tier are the largest in number but the smallest in capability and are connected to the more capable nodes 
at the second tier. All nodes except the server generate data (i.e., partial query results) periodically 
and send them to the server relayed via the nodes at higher tiers. The server then provides the final 
query result to the user. Figure 1(b) shows the set of queries stored in the nodes at each tier at the
end of the query merging phase. In this figure, the rectangular regions represent range queries, and the 
boundary rectangle represents the domain space defined by the attributes specified in the queries. The 
server stores six original queries, the second tier nodes store three queries resulting from the merge of 
the six original queries, and the third tier stores two queries resulting from further merging them. In the 
query processing phase, sensor nodes at the lowest tier process the two queries on the sensed data and 
send to the second tier only the data satisfying the conditions (i.e., ranges) of the two queries. Then, 
the sensor nodes at the second tier process the three queries on the data sent from nodes at the lower 
tier and the data they generate on their own, and send to the server only the data satisfying the query 
conditions. Since nodes at a higher tier have queries of finer granularity, they can reduce "false alarms" 
and thereby reduce energy consumption. The server processes the original queries on the data sent from 
all nodes at lower tiers and provides the final result to the user.
This figure shows a three-tier network as an example.
Figure 1. Inverted hierarchical query structure in a hierarchical wireless sensor network. 

From Figure 1(b), we can see that the stored queries altogether form an inverted structure of a
multi-dimensional index tree. In contrast to a multi-dimensional index tree structure in which all objects 
are stored in the leaf nodes and are merged to become more abstract at a higher level, in the proposed 
structure, the root (i.e., server) stores all objects (i.e., queries) and they are merged to become more 
abstract at a lower level. 

Progressive processing consists of the query merging phase, which generates the queries to be stored
at each tier of the hierarchical sensor network to form an inverted hierarchical query structure, and
the query processing phase, which processes sensed data and sends the result to the server using the
inverted hierarchical query structure. Query merging is performed off-line in batch processing, and
query processing is performed on-line every time data are generated. In query merging, queries are sent
toward the lowest tier while being merged "progressively", and, in query processing, the sensor data are sent
toward the server while being filtered "progressively".

In the query merging phase, minimum bounding rectangles (MBRs) are obtained from the queries 
and expressed as merged queries. In this case, it is important to decide how many MBRs the queries 
should be merged into because the number of MBRs affects the trade-off between the energy consumption 
and the storage usage. That is, if more queries are merged, then the storage space used by the sensor 
nodes to store queries is reduced, but the energy consumption is increased due to more frequent false 
alarms. In this section, we present the model and algorithms under the assumption that the number
of merged queries is known at each tier. Then, in Section 4, we present a method for determining the
optimal number of merged queries analytically using a cost model. 
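To make the notion of a merged query concrete, the sketch below computes the MBR of a group of range queries. The representation of a query as a tuple of (low, high) pairs, one per attribute, and the function name `mbr` are assumptions made for illustration only.

```python
from typing import List, Tuple

# A range query: one (low, high) pair per attribute (hypothetical encoding).
Rect = Tuple[Tuple[float, float], ...]


def mbr(queries: List[Rect]) -> Rect:
    """Return the minimum bounding rectangle enclosing all given queries."""
    dims = len(queries[0])
    return tuple(
        (min(q[d][0] for q in queries), max(q[d][1] for q in queries))
        for d in range(dims)
    )


# Two overlapping two-dimensional queries and their merged query.
q1 = ((10.0, 20.0), (30.0, 50.0))
q2 = ((15.0, 25.0), (40.0, 60.0))
merged = mbr([q1, q2])
print(merged)
```

Storing `merged` instead of `q1` and `q2` halves the storage at a node, at the price that points inside the MBR but outside both queries are forwarded as false alarms.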

In the query processing phase, all sensor nodes except the server process their own sensed data and 
the data received from the nodes at lower tiers, and send the results to the nodes at the next higher 
tier. Since more queries (of finer granularity) are stored at the higher tier nodes, the accuracy of query 
result is higher in them, thus generating the query result progressively. 
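The progressive filtering described here can be sketched as follows. This is a simplified illustration, not the paper's algorithm: it assumes all readings originate at the lowest tier (whereas intermediate-tier nodes also generate data in the model), and the names `satisfies` and `progressive_filter` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, ...]
Rect = Tuple[Tuple[float, float], ...]  # one (low, high) pair per attribute


def satisfies(point: Point, query: Rect) -> bool:
    """True if the data point lies inside the query's hyper-rectangle."""
    return all(lo <= v <= hi for v, (lo, hi) in zip(point, query))


def progressive_filter(points: List[Point], queries_by_tier: List[List[Rect]]):
    """Filter points tier by tier, from the lowest tier up to the server.

    queries_by_tier[0] is the (coarsest) query set of the lowest tier and
    the last entry is the set of original queries held by the server.
    """
    surviving = list(points)
    kept_per_tier = []
    for queries in queries_by_tier:
        surviving = [p for p in surviving
                     if any(satisfies(p, q) for q in queries)]
        kept_per_tier.append(len(surviving))
    return surviving, kept_per_tier


# The lowest tier stores one merged query; the server stores two originals.
lowest_tier = [((0.0, 5.0), (0.0, 5.0))]
server = [((0.0, 2.0), (0.0, 2.0)), ((3.0, 5.0), (3.0, 5.0))]
readings = [(1.0, 1.0), (4.0, 1.0), (8.0, 8.0)]

result, kept = progressive_filter(readings, [lowest_tier, server])
print(result, kept)
```

In this toy run, the reading (8, 8) is dropped at the lowest tier, while (4, 1) passes the merged query as a false alarm and is only eliminated by the finer-grained queries at the server.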

3.2 Network and data models 

In this section, we first define the hierarchical sensor network. Then, we explain data and queries used 
in this paper. 

The hierarchical sensor network 

We make the following assumption about the configuration of a hierarchical sensor network. All sensor 
nodes are connected to form a tree rooted at the server, and the nodes at the same depth make one tier. 
Data are generated by not only the nodes at the lowest tier but also those at intermediate tiers, and 
the sensed data are sent to the server through the nodes at higher tiers. All sensor nodes at the same
tier have the same capability, that is, the same amount of memory and battery power. Nodes closer to 
the server have higher capability, that is, a larger amount of memory and battery power. In addition, 
all nodes at the same tier store the same set of queries. 

There has been various research on the hierarchical sensor network in the literature. However, the
definitions of the hierarchical sensor network vary depending on specific environments. Nevertheless,
it is a common understanding that a hierarchical sensor network consists of multiple tiers and deploys
devices of different capabilities at different tiers [4] [iTl [2]. We define the hierarchical sensor network as
in Definition 1.
Definition 1 (The hierarchical sensor network) The hierarchical sensor network is defined as a
tree $T = (V, E)$ of height $h$, where $V$ is a set of vertices representing the sensor nodes and the server in
the network (the root represents the server), and $E$ is a set of edges representing the direct connection
between a sensor node and its parent node. Let $node_i$ denote the node at the $i^{th}$ tier ($1 \le i \le h$). Let $s_i$ and
$e_i$ denote the amount of storage and the amount of energy of $node_i$, respectively. Then, a hierarchical
sensor network satisfies the relationship: $s_j > s_k$ and $e_j > e_k$ ($1 \le j < k \le h$).
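The capability condition of Definition 1 can be checked mechanically. The sketch below is illustrative only: the `Tier` class, its field names, and the numeric values are assumptions, and tiers are listed from the one closest to the server downward. Checking adjacent tiers suffices because strict inequality is transitive.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Tier:
    storage: int  # amount of storage per node at this tier (arbitrary units)
    energy: int   # amount of energy per node at this tier (arbitrary units)


def is_hierarchical(tiers: List[Tier]) -> bool:
    """Check that storage and energy strictly decrease toward lower tiers.

    tiers[0] is the tier closest to the server; tiers[-1] is the lowest tier.
    """
    return all(
        upper.storage > lower.storage and upper.energy > lower.energy
        for upper, lower in zip(tiers, tiers[1:])
    )


three_tier = [Tier(1024, 5000), Tier(128, 2000), Tier(8, 500)]
flat = [Tier(8, 500), Tier(8, 500), Tier(8, 500)]
print(is_hierarchical(three_tier), is_hierarchical(flat))
```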

Query and data 

In this paper, we focus on the range query as the query type in the hierarchical sensor network since it
is an important query type in sensor network applications [51 [51 IIOI [T^ . Consider a multi-dimensional
domain space defined by the query attributes. Then, in the domain space, a query and a data element
are represented as a hyper-rectangular region and a point, respectively 0.

3.3 Progressive query merging 

3.3.1 The model 

Query merging in the first phase of progressive processing is done by finding the MBR enclosing the 
queries to be merged. Progressive query merging means that more queries are merged as the merging 
progresses to lower tiers. Thus, the size of a query region is larger at a lower tier while the number of 
queries is smaller. Let us refer to a query represented by an MBR that encloses certain queries at a 
higher tier node as a merged query, and denote the set of queries (or the query set) stored at the $i^{th}$-tier
node as $Q_i$. Then, we can represent the set of merged queries at each tier as one level in the inverted
hierarchical query structure, as shown in Figure 2. In this figure, an arrow represents the direction of
query merging; queries at the tail of an arrow are merged to the query at the head of the arrow. For
instance, the queries $q_{1,1}$, $q_{1,2}$, and $q_{1,3}$ at the $1^{st}$ tier are merged to the query $q_{2,1}$ at the $2^{nd}$ tier.

The query merging can also be seen as merging the partition of a disjoint set of queries. Figure 3
illustrates it with the same six queries as in Figure 2. The query $q_{2,1}$ in Figure 2, for example, corresponds
to the subset $\{q_{1,1}, q_{1,2}, q_{1,3}\}$ of $Q_2$ in Figure 3. The partitioning is coarser at a lower tier.
Figure 2. An example of progressive query merging.

Figure 3. An example of progressive partition merging.

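The partition view of query merging can be illustrated with sets of original-query identifiers. The sketch below is an assumption-laden example: the particular grouping of six queries into three and then two blocks is made up for illustration and is not read off the figures, and `is_coarsening` is a hypothetical helper.

```python
from typing import List, Set


def is_coarsening(finer: List[Set[int]], coarser: List[Set[int]]) -> bool:
    """True if every block of `finer` is contained in some block of `coarser`."""
    return all(any(block <= big for big in coarser) for block in finer)


# Each block lists the original queries enclosed by one stored query.
tier1 = [{1}, {2}, {3}, {4}, {5}, {6}]   # server: six original queries
tier2 = [{1, 2, 3}, {4, 5}, {6}]         # second tier: three merged queries
tier3 = [{1, 2, 3, 4, 5}, {6}]           # third tier: two merged queries

print(is_coarsening(tier1, tier2), is_coarsening(tier2, tier3))
```

Each step toward a lower tier only unions existing blocks, which is what makes the stored query sets line up as levels of one inverted structure.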
3.3.2 The algorithm 

For each $i^{th}$ tier, the progressive query merging algorithm generates a merged query set $Q_i$ of a given
size C'i. The objective of the algorithm is to minimize the query processing cost in consideration of the
limited memory of sensor nodes. It is difficult to predict the cost of query processing for a given set of
merged queries. The reason for this is that the cost depends not only on the network-specific factors
like routing but also on unknown factors such as the query and data distributions. In this paper, we
use the simplified model proposed by Xiang et al. [19], in which the cost metric is the amount of data
sent during the query processing, as the basis and extend it to fit into the hierarchical sensor network
and take the memory usage into consideration. In Xiang et al.'s model, the size $O$ of the overlapping
region among queries and the size $D$ of the dead region (i.e., the extra region added to make the
MBR enclosing the merged queries; it causes the false alarms) are calculated for each pair of queries
that are candidates to be merged, and the pair that maximizes the difference between the sizes of the
two regions, $O - D$, is merged. The effect of this is to merge queries with large overlapping regions,
which is a reasonable strategy for reducing the data transmission cost.

The proposed algorithm performs query merging using a greedy approach based on the same
strategy. Let O(q_i, q_j) be the size of the overlapping region between two queries q_i and q_j, and D(q_i, q_j)
be the size of the dead region between them. The algorithm chooses the two queries q_i and q_j with the largest
O(q_i, q_j) − D(q_i, q_j) from the set of queries that are either merged queries or original queries and
merges them first. This strategy is the same as the strategy used by Xiang et al. [19] except that they
consider only the pairs that satisfy O(q_i, q_j) − D(q_i, q_j) > 0. Specifically, in consideration of the storage
cost for storing queries and the energy cost for sending query results, our approach fixes the
number of queries to be stored in a sensor node at each tier and then merges queries using
a greedy method until that number is reached, whereas Xiang et al.'s approach determines the number of
queries to be stored so as to minimize only the amount of data sent.

Figure 4 shows the progressive query merging algorithm. Inputs to this algorithm are the set of
the original queries Q, the height h of the hierarchical sensor network to be built, and the set K of the
numbers of merged queries to be stored in every node at each tier. The output is the sets of merged
queries that are stored in every node at each tier. At each tier t, the algorithm repeats merging two
queries at a time until the number of merged queries falls below k_t (lines 3-6). To find the
pair of queries to be merged, it calculates the difference between the overlapping region and the dead
region over every pair of queries and merges the pair with the maximum difference (lines 4-5).
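
The greedy loop described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the implementation used in this paper: a query is assumed to be an axis-aligned box given as a pair (lows, highs), the dead region of a pair is taken as the part of their MBR covered by neither query, the loop is assumed to stop as soon as at most k_t queries remain, and all function names are hypothetical.

```python
from itertools import combinations

# Assumption: a query is an axis-aligned box (lows, highs), each a tuple of d numbers.

def volume(box):
    """Size of a box; 0 if the box is empty in any dimension."""
    lows, highs = box
    v = 1.0
    for lo, hi in zip(lows, highs):
        v *= max(0.0, hi - lo)
    return v

def mbr(a, b):
    """Minimum bounding rectangle enclosing boxes a and b (the merged query)."""
    return (tuple(min(x, y) for x, y in zip(a[0], b[0])),
            tuple(max(x, y) for x, y in zip(a[1], b[1])))

def overlap(a, b):
    """O(a, b): size of the region shared by a and b."""
    inter = (tuple(max(x, y) for x, y in zip(a[0], b[0])),
             tuple(min(x, y) for x, y in zip(a[1], b[1])))
    return volume(inter)

def dead(a, b):
    """D(a, b): size of the part of the MBR covered by neither a nor b."""
    union = volume(a) + volume(b) - overlap(a, b)
    return volume(mbr(a, b)) - union

def merge_tier(queries, k):
    """Greedily merge the pair with the largest O - D until at most k queries remain."""
    qs = list(queries)
    while len(qs) > max(k, 1):
        i, j = max(combinations(range(len(qs)), 2),
                   key=lambda p: overlap(qs[p[0]], qs[p[1]]) - dead(qs[p[0]], qs[p[1]]))
        merged = mbr(qs[i], qs[j])
        qs = [q for idx, q in enumerate(qs) if idx not in (i, j)]
        qs.append(merged)
    return qs

def progressive_merge(queries, ks):
    """ks = [k_2, ..., k_h]; returns the merged query sets [Q'_2, ..., Q'_h]."""
    tiers, current = [], list(queries)
    for k in ks:
        current = merge_tier(current, k)   # tier t starts from the result of tier t-1
        tiers.append(current)
    return tiers
```

Each tier starts from the query set of the tier above it, so the merged sets become progressively coarser, as in Figure 2. The exhaustive pair search is quadratic in the number of queries per step, which is acceptable for a sketch but would need an index for large query sets.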

Algorithm Progressive Query Merging

Input:  (1) Q = {q_1, q_2, ..., q_n}: the set of queries to be merged.
        (2) h: the height of a sensor network.
        (3) K = {k_1, k_2, ..., k_h}: the set of numbers of merged queries stored into a
            sensor node in each of the 1st to the h-th tier.
            /* k_1 ≥ k_2 ≥ ... ≥ k_h, and k_1 = n,
               where n is the number of original queries. */
Output: {Q'_2, ..., Q'_h}: the sets of merged queries stored into a sensor node at
        each of the 2nd to the h-th tier.

Algorithm:
begin
1.  for tier t = 2 to h begin
2.      Q'_t = Q'_{t-1}  /* Q'_1 = Q */
3.      repeat
4.          find a pair of queries (q_i and q_j) with the highest value of O(q_i, q_j) − D(q_i, q_j) for
            i = 0, 1, ..., |Q'_t| and j = 0, 1, ..., |Q'_t| (j < i); remove them from Q'_t; merge q_i with q_j
5.          insert the merged query (that is, the result of merging q_i with q_j) into Q'_t
6.      until (|Q'_t| < k_t)
7.  end
8.  return {Q'_2, ..., Q'_h}
end

Figure 4. The progressive query merging algorithm.

3.4 Progressive query processing
3.4.1 The model

In the query processing phase, for a given query, it is decided whether a data element falls inside
the query region, that is, whether the attribute values representing the data element satisfy the range
predicates representing the region. Progressive query processing is the process of propagating data
elements bottom up in the inverted hierarchical query structure from the lowest tier nodes to the
highest tier node (server), while filtering the data elements depending on the result of evaluating the
range predicates of the queries at each tier. (Precisely speaking, multiple data elements are sent in a
batch for the sake of efficiency.) Figure 5 shows an example of query processing. In this figure, an arrow
denotes an upward flow of a data item (v) as it satisfies the range predicate of the query at the arrow
tail. In this example, the query q_{1,5} at the server retrieves the data element v.

Figure 5. An example of progressive query processing.

3.4.2 The algorithm

Figure 6 shows the progressive query processing algorithm. The algorithm is run separately at each
tier of the hierarchical sensor network. The algorithm is designed to run for each query on each data
element, which may not be the most efficient in terms of the query processing time. However, the query
processing time is independent of the energy cost and the storage cost, which are the main cost items
considered. Thus, it is not the focus of this paper.

Algorithm Progressive Query Processing

Input:  (1) Q_t = {q_1, q_2, ..., q_m}: a set of merged queries stored into a sensor node, named s_node,
            at the t-th tier. /* t ≥ 2 */
        (2) D_t = {d_1, d_2, ...}: data generated by s_node.
        (3) R_{t+1} = {r_1, r_2, ...}: data received from child sensor nodes at the (t+1)-th tier of s_node.
Output: R_t = {r'_1, ...}: query processing result to be transmitted from s_node to the parent sensor node
        at the (t−1)-th tier (R_t ⊆ (D_t ∪ R_{t+1})).

Algorithm:
begin
1.  D = D_t ∪ R_{t+1}
2.  for each data element e_i in D begin
3.      for each query q_j in Q_t begin
4.          if the value e_i belongs to the query region of q_j begin
5.              insert e_i into R_t
6.              break
7.          end
8.      end
9.  end
10. return R_t
end

Figure 6. The progressive query processing algorithm.
In the progressive query processing, a sensor node at the t-th tier (t ≥ 2) considers the data D_t
generated by itself and the data R_{t+1} resulting from the query processing at the (t + 1)-th tier as the
target data for query processing (line 1). The node compares the set of merged queries Q_t with the
target data and inserts only the data elements that satisfy the query condition into R_t (lines 2-9). In
order to prevent the node from sending duplicate results for overlapping query regions among merged
queries, the algorithm stops the comparison once it finds a query whose region contains the target data
element (line 6)¹. Then, the node sends R_t to its parent node at the (t − 1)-th tier. This algorithm is
run separately in every node at each tier to progressively filter the data arriving at the highest tier
(i.e., the server). Finally, the server (i.e., the 1st tier) performs post-processing to select the query results
satisfying the condition of each query.
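
The per-node filtering step can be sketched in Python as follows. This is a minimal sketch under assumptions that are not stated in the paper: a query is an axis-aligned box (lows, highs), a data element is a tuple of attribute values, range predicates are treated as closed intervals, and the function names are hypothetical.

```python
def in_region(element, query):
    """True if every attribute value lies inside the query's range (closed interval assumed)."""
    lows, highs = query
    return all(lo <= x <= hi for x, lo, hi in zip(element, lows, highs))

def process_at_node(merged_queries, own_data, child_results):
    """Filter the node's own data and its children's results against the merged queries."""
    targets = list(own_data) + list(child_results)   # line 1: D = D_t U R_{t+1}
    result = []
    for e in targets:                                # lines 2-9
        for q in merged_queries:
            if in_region(e, q):
                result.append(e)
                break                                # line 6: forward each element at most once
    return result                                    # line 10: R_t, sent to the parent node
```

Because of the early exit, an element that falls in the overlap of two merged queries is forwarded only once; a server-side variant would drop the `break` and record every matching query instead.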

In this section, we have proposed the algorithms under the assumption that the sensor nodes at 
each tier already know the number of the merged queries to be stored. In the next section, we propose 
an optimization method for determining the optimal number of merged queries. 



4 Determining the Optimal Number of Merged Queries 

In this section, we propose an analytic method for determining the optimal number of merged queries to 
be stored at each tier when designing the hierarchical sensor network. We first propose the cost model 
in Section 4.1 and then the cost optimization method in Section 4.2. 



4.1 The cost model 

In this paper, we use the weighted sum of the storage cost for storing queries and the energy cost for 
sending the query result as the total cost. We use the total amount of memory used in all nodes as the 
storage cost and the total amount of data sent during the query processing as the energy cost. We use 
byte as the unit of both the storage cost and the energy cost. 

¹ When the algorithm is run at the server, line 6 should be removed because the server must answer each query.

Eq. (1) shows the cost model expressed as the function Weighted_Sum.

$$\text{Weighted\_Sum} = \alpha \cdot (\text{the total amount of data sent}) + (\text{the total amount of memory used}) \qquad (1)$$

where α (> 0) is the scale factor provided by the user.



In this equation, the value of α indicates the relative importance of the energy cost over the storage cost,
and is set by the user based on the user's preference. That is, in environments where the energy cost is
more important than the storage cost, the user gives a larger value of α, whereas in environments
where the storage cost is more important than the energy cost, the user gives a smaller value of α. In
this paper, in order to control the trade-off between the two costs, we define the reference value of α,
denoted as α_0, which makes the importance of the two costs equal. This α_0 is the value for balancing
the two costs, which use different scales, and is used as an example to determine the appropriate
value of α for a given application. Eq. (2) shows the definition of α_0.

$$\alpha_0 = \frac{\text{the maximum possible total amount of memory used}}{\text{the maximum possible total amount of data sent}} \qquad (2)$$

In this equation, the denominator represents the total amount of data sent from sensor nodes when
every node stores only one query merged from all the original queries, and the numerator represents
the total amount of memory used for storing queries in sensor nodes when every node stores all the
original queries. That is, α_0 is the result of dividing the worst-case memory usage amount by the
worst-case data transmission amount.

In Eq. (1), the total memory usage amount is determined by the number of queries stored in the
nodes at each tier, and the total data transmission amount is determined by the amount of data sent
at each tier based on the queries. We first introduce the notion of the merge rate in order to formulate
the number of queries stored in sensor nodes at each tier. We use it as the optimization parameter for
Weighted_Sum. The merge rate is defined as the ratio of the memory usage amounts of two nodes
at adjacent tiers, as shown in Eq. (3).

$$\text{merge\_rate} = \frac{\text{the number of queries stored at a node at the } i\text{-th tier}}{\text{the number of queries stored at a node at the } (i-1)\text{-th tier}} \qquad (3)$$

for all 2 ≤ i ≤ h, where h is the height of the hierarchical sensor network, and
the server is at the first (highest) tier storing all the original queries.



According to the definition above, the merge rate has a value in the range of 0 to 1. If the value
is closer to 0, more queries are merged. On the other hand, if the value is closer to 1,
fewer queries are merged. That is, the number of queries stored in a node at each tier is
determined by the merge rate. For example, if the merge rate is 0, our approach is equivalent to the
centralized approach, and if it is 1, it is equivalent to the distributed approach.
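
As a small illustration of how a single merge rate fixes the number of queries at every tier, the following Python sketch applies Eq. (3) repeatedly, starting from the N_Q original queries at the server. The rounding to an integer and the floor of one query per node are assumptions made for this sketch only, and the function name is hypothetical.

```python
def queries_per_tier(n_q, m, h):
    """Number of queries stored at a node of tiers 1..h for merge rate m.

    Tier i stores n_q * m**(i-1) queries (Eq. (3) applied i-1 times); the result is
    rounded and kept at 1 or more, which is an assumption of this sketch.
    """
    return [max(1, round(n_q * m ** (i - 1))) for i in range(1, h + 1)]
```

With m = 1 every tier keeps all the original queries, and as m approaches 0 every tier below the server collapses to a single merged query, which matches the two extreme cases described above.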

Next, we introduce the notion of cover to formulate the amount of data sent at each tier. The 
cover is defined as the ratio of the size of the domain space filled by all query regions over the size of 
the entire domain space. In order to obtain the exact amount of data transmission, we need additional 
information at each tier such as the selectivity of each merged query and the size of each dead region 
caused by query merging. This kind of information, however, is affected significantly by the application 
environment including the data and query distributions, making it difficult to obtain exact information 
at the time of designing the network. Thus, in this paper, we use an approximate model of the cover 
instead. Definition 2 shows the definition of the cover of a query set Q.

Definition 2 (The cover of a query set Q) For a given query set Q = {q_1, q_2, ..., q_n}, its cover
cover(Q) is defined as:

$$\text{cover}(Q) = \frac{\| \Phi(q_1) \oplus \cdots \oplus \Phi(q_n) \|}{\| D \|} \qquad (4)$$

for q_i, q_j ∈ Q (1 ≤ i < j ≤ n),
where D is the domain space,
Φ(q_i) is the region of the query q_i,
Φ(q_i) ⊕ Φ(q_j) represents the union of the two regions Φ(q_i) and Φ(q_j), and
‖·‖ denotes the size of the given region.

□

Assuming that queries are uniformly distributed in the domain space, cover(Q) can be approximated
at each tier as follows. Let n denote the number of merged queries, s denote the average selectivity
of the set of the original queries, and c denote the cover of the set of the original queries; then the
function cover(n) in Figure 7 is an approximation of cover(Q).

[Figure 7: plot of the approximate cover against the number of merged queries n, with n ranging from 1 to c/s.]

Figure 7. The cover model.

cover(n) has the following properties: (1) if n = 1, cover(n) equals 1; (2) as n increases, cover(n)
decreases, becoming c when n = c/s. That is, cover(n) < cover(n − 1) < ... < cover(1) = 1.

These properties follow from the fact that the proposed merge method is based on MBRs. Since the region
of a merged query is represented by an MBR enclosing the regions of the queries that are merged, the size of
the region of the merged query is always greater than or equal to the size of the region resulting from the
union of the query regions that are merged. Thus, as the query merging proceeds, the number of merged
queries n decreases, but the size of the region that is equivalent to the union of merged queries increases.
In this paper we have assumed an environment in which we process a large number of queries with the
uniform distribution, and thus, we assume that the cover of the merged query is 1. Even though this
property does not guarantee the linearity of cover(n), in order to keep the model simple, we assume
that the cover increases linearly as n decreases, and then estimate the theoretical number of queries for
which the cover is completely filled without any overlapping region as

$$n = \frac{\text{the cover of original queries}}{\text{the average selectivity of original queries}} = \frac{c}{s}.$$
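
The linear cover model can be written down directly from its two end points, cover(1) = 1 and cover(c/s) = c. The Python sketch below uses the coefficients a = s(1 − c)/(c − s) and b = 1 + a that accompany Eq. (5) in Section 4.2; the function name is hypothetical, and the sketch assumes c > s so that the slope is well defined.

```python
def cover_model(n, c, s):
    """Approximate cover of n merged queries under the linear model.

    The line passes through (1, 1) and (c/s, c):
        cover(n) = -a * n + b,  a = s*(1 - c)/(c - s),  b = 1 + a.
    Assumes c > s.
    """
    a = s * (1.0 - c) / (c - s)
    b = 1.0 + a
    return -a * n + b
```

Substituting n = 1 gives −a + 1 + a = 1, and substituting n = c/s gives 1 − (1 − c) = c, so the line reproduces both end points of the model.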

4.2 Optimization 

In this subsection, we first formulate Weighted_Sum using the merge_rate and the cover model explained 
in Section 4.1, and then, analytically obtain the optimal merge rate - the merge rate that minimizes 
Weighted_Sum. Table 1 shows the notation used in this section. For ease of exposition, we assume that 
each sensor node generates only one data element per unit time. 



Table 1. The notation.

Symbol     Definition
N_Q        The number of original queries
c          The cover of original queries
s          The average selectivity of original queries
d          The dimension of original queries
h          The height of a hierarchical sensor network
f          The fanout of a hierarchical sensor network
Size_de    The size of a data element
m          The merge rate



The total_transmission (i.e., the total amount of data sent per unit time) is formulated as follows
(refer to Appendix A for details):

$$\text{total\_transmission} = \sum_{i=2}^{h} \Big( Size_{de} \cdot f^{\,i-1} \cdot \sum_{j=2}^{i} \big( -a \cdot m^{j-1} \cdot N_Q + b \big) \Big), \quad \text{where } a = \frac{s \cdot (1-c)}{c-s},\; b = 1 + a \qquad (5)$$

The total_storage (i.e., the total amount of memory used) is formulated as follows (refer to Appendix A
for details):

$$\text{total\_storage} = \sum_{i=2}^{h} \big( 2 \cdot Size_{de} \cdot f^{\,i-1} \cdot N_Q \cdot m^{i-1} \big) \qquad (6)$$

From Eq. (5) and Eq. (6), Weighted_Sum is formulated as follows.

$$\text{Weighted\_Sum} = \alpha \cdot \text{total\_transmission} + \text{total\_storage} = Size_{de} \cdot \sum_{i=2}^{h} \Big( f^{\,i-1} \cdot \Big[ \alpha \cdot \sum_{j=2}^{i} \big( -a \cdot m^{j-1} \cdot N_Q + b \big) + 2 \cdot N_Q \cdot m^{i-1} \Big] \Big), \quad \text{where } a = \frac{s \cdot (1-c)}{c-s},\; b = 1 + a \qquad (7)$$

In order to obtain the optimal merge rate, we take the derivative of the Weighted_Sum formula 
with respect to m and compute the roots from the derivative formula. Then, we substitute each root 
for m in the Weighted_Sum formula and find the root that minimizes the computed Weighted_Sum. We 
use Maple, a mathematics software tool, for this computation.
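
For readers without a symbolic algebra tool, the optimization can also be approximated numerically. The Python sketch below evaluates Eqs. (5) to (7) and picks the merge rate with the smallest Weighted_Sum on a uniform grid; this grid search is a substitute for the derivative-and-roots procedure described above, not the method used in this paper. The function names, the grid resolution `steps`, and the exclusion of m = 0 from the grid are assumptions of the sketch.

```python
def cover_coefficients(c, s):
    """Coefficients a and b of the linear cover model (assumes c > s)."""
    a = s * (1.0 - c) / (c - s)
    return a, 1.0 + a

def total_transmission(m, n_q, c, s, h, f, size_de):
    """Eq. (5): data sent per unit time, summed over all tiers 2..h."""
    a, b = cover_coefficients(c, s)
    total = 0.0
    for i in range(2, h + 1):
        # A data element generated at tier i crosses hops j = 2..i with the modeled cover.
        hops = sum(-a * m ** (j - 1) * n_q + b for j in range(2, i + 1))
        total += size_de * f ** (i - 1) * hops
    return total

def total_storage(m, n_q, h, f, size_de):
    """Eq. (6): memory used for storing merged queries at tiers 2..h."""
    return sum(2 * size_de * f ** (i - 1) * n_q * m ** (i - 1)
               for i in range(2, h + 1))

def weighted_sum(m, alpha, n_q, c, s, h, f, size_de):
    """Eq. (7): alpha * total_transmission + total_storage."""
    return (alpha * total_transmission(m, n_q, c, s, h, f, size_de)
            + total_storage(m, n_q, h, f, size_de))

def best_merge_rate(alpha, n_q, c, s, h, f, size_de, steps=1000):
    """Grid search over m in (0, 1]; returns the grid point minimizing Weighted_Sum."""
    grid = [k / steps for k in range(1, steps + 1)]
    return min(grid, key=lambda m: weighted_sum(m, alpha, n_q, c, s, h, f, size_de))
```

A call such as `best_merge_rate(alpha, n_q=1046, c=0.1, s=1e-4, h=4, f=8, size_de=1)` returns the grid point closest to the model's optimum for the chosen α; the accuracy is limited by the grid spacing 1/steps.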



5 Performance evaluation 

5.1 Experimental data and environments 

We use two sets of experiments. In the first set, we show the accuracy of the proposed cost model
as the parameters are varied. In the second set, we show the merit of our progressive approach over
the iterative approach proposed by Xiang et al. [19] in terms of the total cost (i.e., Weighted_Sum) of
query processing as the parameters are varied. A common set of seven parameters is used in both
sets of experiments: the scale factor α for controlling the relative importance of the amount of data
transmission and the amount of memory usage, the cover of original queries c, the average selectivity
of original queries s, the dimension of original queries d, the height of the sensor network h, the fanout
of the sensor network f, and the merge rate m. We use Weighted_Sum as both the accuracy and the
performance measure.
We use the same data and query sets in both sets of experiments. We randomly generate synthetic
queries and data with the uniform distribution. Here, "uniform" means that the locations of the queries
(or the data elements) are set randomly in the query space (or the domain space). We generate queries
with the same width in all domains (i.e., hypercubes) in two alternative ways: either by controlling
the number of original queries or by controlling the cover of original queries. The latter is used only
in the experiments for varying the cover of original queries, and the former is used in all the other
experiments. The reason we do not control the number and the cover of the queries together is that
the two values depend on each other. That is, given a set of random queries with a uniform
distribution, if the number of queries increases (with the query selectivity fixed), then the cover also
increases. This makes it impossible to generate a query set with a uniform distribution when both the
number and the cover are controlled at the same time.
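
The following Python sketch illustrates one way such a query set could be generated and its cover estimated. It is not the simulator used in the experiments: the unit hypercube as the domain space, the rule that each query must lie entirely inside the domain, the Monte Carlo estimate of the cover, and all names are assumptions of this sketch.

```python
import random

def generate_queries(n, s, d, rng):
    """n hypercube queries of selectivity s in the d-dimensional unit domain."""
    side = s ** (1.0 / d)                 # equal width in every dimension
    queries = []
    for _ in range(n):
        lows = tuple(rng.uniform(0.0, 1.0 - side) for _ in range(d))
        highs = tuple(lo + side for lo in lows)
        queries.append((lows, highs))
    return queries

def estimate_cover(queries, d, rng, samples=20000):
    """Fraction of uniformly sampled points that fall inside at least one query."""
    hits = 0
    for _ in range(samples):
        p = tuple(rng.random() for _ in range(d))
        if any(all(lo <= x <= hi for x, lo, hi in zip(p, q[0], q[1]))
               for q in queries):
            hits += 1
    return hits / samples
```

Calling `generate_queries` with a growing n at a fixed s and re-estimating the cover shows the dependency described above: the cover rises with the number of queries, so the two cannot be fixed independently. Controlling the cover instead would amount to adding queries until the estimate reaches the target value.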

In the first set of experiments, we experimentally evaluate the accuracy of our model for estimating
the optimal merge rate that minimizes the weighted sum of the storage cost and the energy cost (i.e.,
Eq. (1)). We first analytically compute the estimated optimal merge rate as explained in Section 4.2.
Next, we experimentally find the actual optimal merge rate. Finally, we compare the two optimal merge
rates. Table 2 summarizes the experiments and the parameters used.

In the second set of experiments, we compare the performance of our progressive approach
with that of the iterative approach proposed by Xiang et al. [19]. We measure Weighted_Sum while varying
the parameters explained above. Here, in our approach, we use the estimated optimal merge rate when
measuring Weighted_Sum. Table 3 summarizes the experiments and the
parameters used.

All experiments have been conducted using a Linux-Redhat system with a 4 GHz processor and 1 
Gbytes of main memory. Since it is difficult to build an actual large-scale sensor network and change its 
configuration as we need, we conduct the experiments using a simulator program as commonly used in 
sensor networks-related database research [SI [51 [H]. We have implemented the simulator program using 
C. Table 4 summarizes the notation used in the next section to discuss the experimental results. 
Table 2. Experiments and parameters used for showing the accuracy of the cost model.

Experiment     Description                 Parameter   Value(s)
Experiment 1   accuracy as α is varied     h           4
                                           f           8
                                           α           α_0·10^-2, α_0·10^-1, α_0, α_0·10^1, α_0·10^2
                                           s           10^-4
                                           d           2
Experiment 2   accuracy as c is varied     h           4
                                           f           8
                                           α           α_0
                                           d           2
                                           c           0.01, 0.10, 0.99
Experiment 3   accuracy as s is varied     h           4
                                           f           8
                                           α           α_0
                                           s           10^-5, 10^-4, 10^-3
                                           d           2
Experiment 4   accuracy as h is varied     h           3, 4, 5
                                           f           8
                                           α           α_0
                                           s           10^-4
                                           d           2
Experiment 5   accuracy as f is varied     h           4
                                           f           2, 4, 8, 16
                                           α           α_0
                                           s           10^-4
                                           d           2
Experiment 6   accuracy as d is varied     h           4
                                           f           8
                                           α           α_0
                                           s           10^-4
                                           d           1, 2, 3


5.2 Experimental results 



5.2.1 Accuracy of the cost model 



Experiment 0: existence of the trade-off and the optimal merge rate 

Figure 8(a) shows the trade-off between the total storage cost and the total transmission cost (i.e., energy
cost) as m is varied. Here, we measure Weighted_Sum for 1046 randomly generated queries (i.e.,
N_Q = 1046). Hereafter, we use N_Q = 1046 unless we explicitly specify the value. As explained in
Section 3.1, the transmission cost (i.e., α_0 · total_transmission) has a tendency to decrease as m increases.
The storage cost has a tendency to increase as m does. Thus, a value of m that minimizes the weighted
sum exists, as shown in Figure 8(a). Figure 8(b) shows the trend of the actual optimal merge rate as m
is varied. We observe that the optimal merge rate has a tendency to increase as α does.

Table 3. Experiments and parameters used for showing the performance merit of our approach.

Experiment      Description                                      Parameter   Value(s)
Experiment 7    comparison of the performance as α is varied     h           4
                                                                 f           8
                                                                 α           α_0·10^-2, α_0·10^-1, α_0, α_0·10^1, α_0·10^2
                                                                 s           10^-4
                                                                 d           2
Experiment 8    comparison of the performance as c is varied     h           4
                                                                 f           8
                                                                 α           α_0
                                                                 d           2
                                                                 c           0.01, 0.10, 0.99
Experiment 9    comparison of the performance as s is varied     h           4
                                                                 f           8
                                                                 α           α_0
                                                                 s           10^-3, 10^-4, 10^-5
                                                                 d           2
Experiment 10   comparison of the performance as h is varied     h           3, 4, 5
                                                                 f           8
                                                                 α           α_0
                                                                 s           10^-4
                                                                 d           2
Experiment 11   comparison of the performance as f is varied     h           4
                                                                 f           2, 4, 8, 16
                                                                 α           α_0
                                                                 s           10^-4
                                                                 d           2
Experiment 12   comparison of the performance as d is varied     h           4
                                                                 f           8
                                                                 α           α_0
                                                                 s           10^-4
                                                                 d           1, 2, 3



Table 4. Notation for explaining experiments.

Symbol       Definition
m_opt_act    The actual optimal merge rate measured
m_opt_est    The estimated optimal merge rate obtained using the analytical model
W_opt_act    Weighted_Sum measured using m_opt_act
W_opt_est    Weighted_Sum measured using m_opt_est
ratio_m      The ratio of m_opt_act to m_opt_est = m_opt_act / m_opt_est
ratio_W      The ratio of W_opt_act to W_opt_est = W_opt_act / W_opt_est
gain_W       (Weighted_Sum measured using Xiang et al.'s iterative approach) / W_opt_est

[Figure 8, panel (a): existence of the trade-off and the optimal merge rate, plotting α_0 · total_transmission, total_storage, and Weighted_Sum against the merge rate m (α = α_0, s = 10^-4, h = 4, f = 8, d = 2, and N_Q = 1046). Panel (b): tendency of the optimal merge rate, plotted against the merge rate m for different values of α (s = 10^-4, h = 4, f = 8, d = 2, and N_Q = 1046).]

Figure 8. The existence of trade off and the optimal merge rate.

Experiment 1: accuracy as α is varied

Figure 9 shows the experimental results as α is varied. We have different optimal merge rates for different
scale factors, as shown in this figure. From Figure 9, we see that ratio_m is 0.905 to 2.619. Other than the
value of 2.619 when α is α_0 · 10^-2, ratio_m is approximately 1.0 for all the other values of α. That is, the
optimal merge rate measured from the experimental data is almost the same as that obtained from the
analysis. Besides, we see that the value of ratio_W is 0.929 to 1.0. That is, the values of Weighted_Sum
measured from the experimental data are very close to those obtained from the analysis. As we see from
the result of this experiment, as α increases, the weight of the total transmission cost increases relative
to the weight of the total storage cost and, thus, the optimal value is determined toward reducing the
total transmission cost, that is, toward making the optimal merge rate close to 1.

Figure 9. Optimal merge rate as α is varied (s = 10^-4, h = 4, f = 8, d = 2, and N_Q = 1046).

Experiment 2: accuracy as c is varied

Figure 10 shows the experimental results as the cover is varied. We use different query sets for different
covers (we use N_Q = 101 when c = 0.01, N_Q = 1046 when c = 0.1, and N_Q = 52685 when c = 0.99). From
Figure 10, we see that ratio_m is 0.00092 to 1.008. Other than the value 0.00092 when the cover is 0.99,
ratio_m is approximately 1.0 for all the other values of the cover. Besides, we see that the value of ratio_W
is 0.995 to 1.0. That is, the Weighted_Sum measured from the experiments is similar to that obtained
from the analysis. As the cover increases, the difference between the maximum and the minimum
amounts of data transmission should decrease. Thus, a reduction of the total data transmission cost has
no significant influence on the total cost if the cover increases. Hence, the optimal value is determined
toward reducing the total storage cost, that is, toward making the optimal merge rate close to 0.

Figure 10. Optimal merge rate as c is varied (α = α_0, s = 10^-4, h = 4, f = 8, and d = 2).

Experiment 3: accuracy as s is varied

Figure 11 shows the experimental results as the selectivity is varied. From Figure 11, we see that ratio_m
is 0.958 to 1.183 and ratio_W is 0.983 to 1.0. The increase of the selectivity is closely related to the
increase of the cover. That is, if the selectivity increases while the number of queries is fixed, then the
cover of the original queries increases as well, and, thus, as in the case of varying the cover, the optimal
value moves toward reducing the total storage cost, that is, toward making the optimal merge rate close to 0.

Figure 11. Optimal merge rate as s is varied (α = α_0, h = 4, f = 8, d = 2, and N_Q = 1046).

Experiment 4: accuracy as h is varied

Figure 12 shows the experimental results as the height is varied. We see that ratio_m is 0.993 to 1.088
and ratio_W is 0.993 to 1.0. When the height of the sensor network increases, the data transmission
cost increases faster than the memory usage cost. This stems from the fact that the data sent are
accumulated at each tier. Thus, the optimal value moves toward reducing the total data transmission
cost, that is, toward making the optimal merge rate close to 1.

Figure 12. Optimal merge rate as h is varied (α = α_0, s = 10^-4, f = 8, d = 2, and N_Q = 1046).

Experiment 5: accuracy as f is varied

Figure 13 shows the experimental results as the fanout is varied. We see that ratio_m is 0.993 to 1.088
and ratio_W is 0.994 to 1.0.

Figure 13. Optimal merge rate as f is varied (α = α_0, s = 10^-4, h = 4, d = 2, and N_Q = 1046).

Experiment 6: accuracy as d is varied

Figure 14 shows the experimental results as the dimension is varied. We see that ratio_m is 0.993 to
1.088 and ratio_W is 0.997 to 1.0.

Figure 14. Optimal merge rate as d is varied (α = α_0, s = 10^-4, h = 4, f = 8, and N_Q = 1046).

5.2.2 Performance merit of our approach

Experiment 7: performance as α is varied

Figure 15 shows the experimental result as α is varied. Here, we have different merge rates estimated for
different scale factors (we use m_opt_est = 0.00096 when α = α_0 · 10^-2, m_opt_est = 0.062 when α = α_0 · 10^-1,
m_opt_est = 0.563 when α = α_0, m_opt_est = 0.985 when α = α_0 · 10^1, and m_opt_est = 0.985 when α = α_0 · 10^2).
From this figure, we can see that gain_W is 0.989 to 84.995. Except for the value 0.989 when α equals
α_0 · 10, gain_W is 1.004 to 84.995; that is, Weighted_Sum in the progressive approach is smaller than
Weighted_Sum in the iterative approach. The exception arises because the cover model
used in this paper (see Figure 7) is an approximation of the cover in the real environment, and this
introduces some error between the actual cost and the estimated cost. From these results, we see that
our approach greatly improves the performance over the approach proposed by Xiang et al. [19] when
memory usage is the prevailing cost (i.e., α is small), while giving a competitive performance when data
transmission is the prevailing cost (i.e., α is large).

Figure 15. The performance of the progressive approach and the iterative approach as α is varied (s = 10^-4,
h = 4, f = 8, d = 2, and N_Q = 1046).

Experiment 8: performance as c is varied

Figure 16 shows the experimental result as the cover is varied. Here, we have different merge rates
estimated for different covers (we use m_opt_est = 0.602 when c = 0.01, m_opt_est = 0.562 when c = 0.1, and
m_opt_est = 0.021 when c = 0.99). We have different query sets for different covers (we use N_Q = 101 when
c = 0.01, N_Q = 1046 when c = 0.1, and N_Q = 52685 when c = 0.99). We see that gain_W ranges from 1.019 to
2.498. This result shows that our approach outperforms Xiang et al.'s approach in the entire range of
the cover. It also shows that, as the cover increases, the performance benefit of our approach over Xiang
et al.'s approach decreases. The benefit of query merging with respect to the storage amount becomes
maximum when the cover approaches 1.0. In this case, all the original queries are merged into one query
in both our approach and Xiang et al.'s approach; as a result, the total transmission amounts and
the total storage amounts of the two approaches become similar and, therefore, the weighted sums of the
two approaches become similar as well. Our proposed approach shows more performance benefit when
the cover of the original queries is smaller. This case is more likely to happen in a real environment.

Figure 16. The performance of the progressive approach and the iterative approach as c is varied (α = α_0,
s = 10^-4, h = 4, f = 8, d = 2).

Experiment 9: performance as s is varied 

Figure 17 shows the experimental result as the average selectivity is varied. We have different merge
rates estimated for different selectivities (we use m_opt_est=0.604 for the smallest selectivity tested,
m_opt_est=0.563 when s=10^-4, and m_opt_est=0.355 for the largest selectivity tested). We see that the
gain ranges from 1.262 to 2.666. This result
shows that our approach outperforms Xiang et al.'s approach over the entire range of selectivity. It also
shows that, as the selectivity increases, the performance benefit of our approach decreases. As already
mentioned in the experiment that compares the optimal merge rates obtained from the experimental
data with those obtained from the analysis, if the selectivity increases, then the cover increases as well,
causing the decrease in performance benefit seen in Figure 17. Thus, our proposed approach shows
more performance benefit when the selectivity of the original queries is smaller.



Figure 17. The performance of the progressive approach and the iterative approach as s is varied (α = α_0,
h=4, f=8, d=2, and N_Q=1046).

Experiment 10: performance as h is varied 

Figure 18 shows the experimental result as the height of the hierarchical sensor network is varied. We have
different merge rates estimated for different heights (we use m_opt_est=0.437 when h=3, m_opt_est=0.563
when h=4, and m_opt_est=0.643 when h=5). We see that the gain ranges from 1.973 to 2.220. This result
shows that our approach outperforms Xiang et al.'s approach over the entire range of the height. It also
shows that, as the height increases, the performance benefit of our approach increases slightly. The
reason for this increase is that the total storage amount in the iterative approach increases faster than
in the progressive approach as the height increases. That is, in the iterative approach the same set of
merged queries is stored in all sensor nodes regardless of the tier whereas, in our progressive approach, a
smaller number of queries is stored as the tier goes lower. Thus, our approach shows more performance
benefit when the height of the sensor network is larger.
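
The storage argument above can be illustrated with a small calculation. The sketch below is not the paper's experiment: it counts stored queries only, takes the per-node query count N_Q · m^(i-1) and the node count f^(i-1) at tier i from Appendix A, keeps m fixed for every height (the experiments re-estimate m for each height), and uses a hypothetical value n_merged for the size of the single merged set that the iterative approach stores at every node.

```python
def progressive_storage(n_q: float, m: float, f: int, h: int) -> float:
    """Stored queries summed over tiers 2..h when each of the f**(i-1)
    nodes at tier i holds n_q * m**(i-1) merged queries."""
    return sum(f ** (i - 1) * n_q * m ** (i - 1) for i in range(2, h + 1))


def iterative_storage(n_merged: float, f: int, h: int) -> float:
    """Stored queries summed over tiers 2..h when every node holds the same
    n_merged queries (n_merged is a hypothetical input, not a paper value)."""
    return sum(f ** (i - 1) * n_merged for i in range(2, h + 1))


def storage_ratio(n_q: float, m: float, f: int, h: int, n_merged: float) -> float:
    """Iterative storage divided by progressive storage."""
    return iterative_storage(n_merged, f, h) / progressive_storage(n_q, m, f, h)


# N_Q, m, and f follow the default setting of the experiments; choosing
# n_merged equal to the tier-2 count N_Q * m is an assumption for illustration.
N_Q, M, F = 1046, 0.563, 8
ratios = [storage_ratio(N_Q, M, F, h, n_merged=N_Q * M) for h in (3, 4, 5)]
for h, r in zip((3, 4, 5), ratios):
    print(f"h={h}: iterative/progressive storage = {r:.3f}")
```

Under these assumptions the ratio grows with h for any m below 1, which matches the direction of the trend reported for Experiment 10; the magnitudes are not comparable to the reported gain, which is measured on the weighted sum.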

Figure 18. The performance of the progressive approach and the iterative approach as h is varied (α = α_0,
s = 10^-4, f=8, d=2, and N_Q=1046).

Experiment 11: performance as f is varied

Figure 19 shows the experimental result as the fanout of the sensor network is varied. We have different
merge rates estimated for different fanouts (we use m_opt_est=0.531 when f=2, m_opt_est=0.553 when f=4,
m_opt_est=0.563 when f=8, and m_opt_est=0.568 when f=16). In the result, the gain ranges from 2.103 to
2.159. We observe that, over the entire range of f, the performance of our approach is better than that of the
iterative approach.

Figure 19. The performance of the progressive approach and the iterative approach as f is varied (α = α_0,
s = 10^-4, h=4, d=2, and N_Q=1046).

Experiment 12: performance as d is varied 

Figure 20 shows the experimental result as the dimension of a query is varied. We have different merge
rates estimated for different dimensions (we use m_opt_est=0.558 when d=1, m_opt_est=0.563 when d=2,
and m_opt_est=0.572 when d=3). In the result, the gain ranges from 1.842 to 2.157. We observe that, over
the entire range of d, the performance of our approach is better than that of the iterative approach.



Figure 20. The performance of the progressive approach and the iterative approach as d is varied (α = α_0,
s = 10^-4, h=4, f=8, and N_Q=1046).

In summary, the experimental results show that our approach outperforms Xiang et al.'s approach
by up to 84.995 times as α is varied, except when α is equal to α_0 · 10. The results also show that our
approach outperforms Xiang et al.'s approach by up to 2.666 times as the following other parameters
are varied: the cover, average selectivity, and dimension of the original queries, and the height and
fanout of the hierarchical sensor network.

6 Conclusions 

In this paper, we have proposed progressive processing as a new approach to processing continuous range
queries in hierarchical sensor networks. The contributions of this paper are summarized as follows.

First, we have proposed a progressive processing model that considers the trade-off between energy 
and storage. This model takes advantage of the characteristics of the hierarchical sensor networks in 
which higher capability sensor nodes are deployed at a tier closer to the server. It also has the advantage 
of reducing the cost of building the network by reducing the storage cost at lower tier nodes, which are 
larger in number. We also have presented query merging and query processing algorithms for this model. 

Second, based on the proposed model, we have proposed a method for optimizing the total cost
(formulated as the weighted sum of the energy and storage costs) according to the given weight, and
have proposed a method for systematically building a hierarchical sensor network that minimizes the
total cost.
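
As a rough illustration of this optimization, the sketch below evaluates the two cost formulas of Appendix A (Eq. (11) and Eq. (14)) over a grid of merge rates and picks the cheapest one. Several things here are assumptions rather than the paper's method: the weighted sum is taken to be alpha · total_transmission + total_storage, a brute-force grid search stands in for the analytical optimization, the cover is clamped to [0, 1], and the slope a of the linear cover model is a hypothetical value chosen so that the cover of the unmerged query set is 0.1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Network:
    n_q: int        # number of original queries held by the server (tier 1)
    f: int          # fanout
    h: int          # height; sensor nodes occupy tiers 2..h
    size_de: float  # size of one data element
    a: float        # slope of the linear cover model (hypothetical value below)
    b: float        # intercept of the linear cover model (b = 1 + a)


def total_transmission(net: Network, m: float) -> float:
    """Eq. (11): data from tier i is sent once per hop up to the server."""
    total = 0.0
    for i in range(2, net.h + 1):
        # The clamp to [0, 1] is a safeguard added here, not part of Eq. (11).
        covers = (min(1.0, max(0.0, -net.a * net.n_q * m ** (j - 1) + net.b))
                  for j in range(2, i + 1))
        total += net.size_de * net.f ** (i - 1) * sum(covers)
    return total


def total_storage(net: Network, m: float) -> float:
    """Eq. (14): memory for merged queries over all nodes at tiers 2..h."""
    return sum(2 * net.size_de * net.f ** (i - 1) * net.n_q * m ** (i - 1)
               for i in range(2, net.h + 1))


def weighted_sum(net: Network, m: float, alpha: float) -> float:
    """Assumed form of the total cost; the paper's exact weighting may differ."""
    return alpha * total_transmission(net, m) + total_storage(net, m)


def best_merge_rate(net: Network, alpha: float, steps: int = 1000) -> float:
    """Grid search over m in (0, 1]; a stand-in for an analytical optimum."""
    candidates = [k / steps for k in range(1, steps + 1)]
    return min(candidates, key=lambda m: weighted_sum(net, m, alpha))


# n_q, f, and h follow the default experimental setting; size_de and a are
# hypothetical (a is chosen so that the cover of n_q queries equals 0.1).
net = Network(n_q=1046, f=8, h=4, size_de=1.0, a=0.9 / 1045, b=1 + 0.9 / 1045)
for alpha in (0.0, 1.0e3, 1.0e9):
    print(alpha, best_merge_rate(net, alpha))
```

With alpha = 0 only storage matters and the search returns the smallest merge rate on the grid; with a very large alpha only transmission matters and it returns m = 1, that is, no additional merging between tiers.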

Third, we have verified the merit of the proposed approach through extensive experiments. In the
experiments for evaluating the accuracy of the proposed cost model, the results show that the ratio
of the measured optimal cost to that obtained from the analytical cost model ranges from 0.929 to 1.0.
From these results we see that a hierarchical sensor network with near-optimal total cost can be built using
the proposed model. In the experiments for evaluating the query processing performance, the results
show that our approach outperforms the approach proposed by Xiang et al. [19] by up to 84.995 times.
Moreover, as the height of the sensor network increases, the performance benefit of our approach over
Xiang et al.'s approach increases. Thus, we can see that our approach is suitable for a large-scale sensor network.

In conclusion, our approach provides a new framework for building a large-scale hierarchical sensor 
network that efficiently processes a large number of queries while considering the trade-off between the 
energy consumed and the storage required. 

As future work, we plan to improve the query processing model and algorithms to consider
different data and query distributions as well as different query types such as aggregate queries.

Appendix-A Derivation of Formulas for total_transmission and total_storage

Derivation of total_transmission

The total amount of data sent, denoted as total_transmission, is the sum of the amounts of data sent
by all nodes at each tier while the data are relayed to the server. Eq. (8) shows the formula for computing
total_transmission.

$$\mathit{total\_transmission} = \sum_{i=2}^{h} \Bigl( \mathit{Amt\_data}_i \cdot \sum_{j=2}^{i} c_j \Bigr) \qquad (8)$$

where c_j = the cover of merged queries stored at the j-th tier, and
Amt_data_i = the amount of data generated by the sensor nodes at the i-th tier.

In Eq. (8), c_j is formulated as follows using the definition of the cover model (see Figure 7) and the
merge_rate.

$$c_j = \mathit{cover}(N_Q \cdot m^{j-1}) \qquad (9)$$

where N_Q · m^(j-1) is the number of queries stored at the j-th tier (note that N_Q is the number of queries
stored in the server (at the 1st tier) and m is the merge rate between two nodes in adjacent tiers (see
the notation table)). In the same Eq. (8), Amt_data_i is formulated as follows, based on the assumption that each
sensor node generates only one data element per unit time.



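$$\begin{aligned}
\mathit{Amt\_data}_i &= (\text{the number of sensor nodes at the } i\text{-th tier}) \cdot (\text{the size of a data element}) \\
&= f^{i-1} \cdot \mathit{Size}_{de} \qquad (10)
\end{aligned}$$

By substituting c_j and Amt_data_i in Eq. (8) with those from Eq. (9) and Eq. (10), we can rewrite the
formula for total_transmission as follows.

$$\begin{aligned}
\mathit{total\_transmission} &= \sum_{i=2}^{h} \Bigl( \mathit{Amt\_data}_i \cdot \sum_{j=2}^{i} c_j \Bigr) \\
&= \sum_{i=2}^{h} \Bigl( \mathit{Size}_{de} \cdot f^{i-1} \cdot \sum_{j=2}^{i} \mathit{cover}(N_Q \cdot m^{j-1}) \Bigr) \\
&= \sum_{i=2}^{h} \Bigl( \mathit{Size}_{de} \cdot f^{i-1} \cdot \sum_{j=2}^{i} \bigl( -a \cdot m^{j-1} \cdot N_Q + b \bigr) \Bigr) \qquad (11)
\end{aligned}$$

$$\text{where } a = \frac{S \cdot (1 - c)}{\ldots} \text{ and } b = 1 + a$$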
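
To make the double sum in Eq. (11) concrete, the following sketch evaluates it directly. It is only an illustration: the function and parameter names are not from the paper, the cover is taken to be the linear model cover(n) = -a · n + b that appears in Eq. (11), and the numeric value of a in the example is a hypothetical choice (picked so that the cover of the unmerged query set is 0.5), not the paper's expression for a.

```python
def cover(n_queries: float, a: float, b: float) -> float:
    """Linear cover model as it appears in Eq. (11): cover(n) = -a * n + b."""
    return -a * n_queries + b


def total_transmission(n_q: float, m: float, f: int, h: int,
                       size_de: float, a: float, b: float) -> float:
    """Evaluate Eq. (11) term by term.

    Data generated at tier i travels through tiers i, i-1, ..., 2; at tier j
    only the fraction c_j = cover(n_q * m**(j-1)) is forwarded (Eq. (9)).
    """
    total = 0.0
    for i in range(2, h + 1):
        amt_data = f ** (i - 1) * size_de                      # Eq. (10)
        forwarded = sum(cover(n_q * m ** (j - 1), a, b)        # inner sum over j
                        for j in range(2, i + 1))
        total += amt_data * forwarded
    return total


# Hypothetical toy setting: 11 queries, fanout 2, height 3, unit-size data.
# a = 0.05 and b = 1 + a give cover(11) = 0.5 and cover(1) = 1.0.
A, B = 0.05, 1.05
no_merging = total_transmission(n_q=11, m=1.0, f=2, h=3, size_de=1.0, a=A, b=B)
half_merging = total_transmission(n_q=11, m=0.5, f=2, h=3, size_de=1.0, a=A, b=B)
print(no_merging, half_merging)
```

In this toy setting a smaller merge rate (more merging at the lower tiers) raises the cover at those tiers and therefore the amount of data transmitted, which is the energy side of the trade-off.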



Derivation of total_storage

The total amount of memory used, denoted as total_storage, is the sum of the amounts of memory used
by all nodes at all tiers. Eq. (12) shows the formula for computing total_storage.

$$\mathit{total\_storage} = \sum_{i=2}^{h} \mathit{Amt\_mem}_i \qquad (12)$$

where Amt_mem_i = the amount of memory needed to store
the merged queries in all sensor nodes at the i-th tier.

The number of merged queries stored in a sensor node at the i-th tier is formulated as N_Q · m^(i-1)
using Eq. (12) and Eq. (9). Thus, Amt_mem_i is formulated as in Eq. (13).



$$\begin{aligned}
\mathit{Amt\_mem}_i &= (\text{the number of merged queries stored in all sensor nodes at the } i\text{-th tier}) \\
&\quad \cdot (\text{the amount of memory needed for storing one query}) \\
&= f^{i-1} \cdot N_Q \cdot m^{i-1} \cdot 2 \cdot \mathit{Size}_{de} \qquad (13)
\end{aligned}$$

By substituting Amt_mem_i from Eq. (13) into Eq. (12), the formula for total_storage can be rewritten as
follows.

$$\begin{aligned}
\mathit{total\_storage} &= \sum_{i=2}^{h} \mathit{Amt\_mem}_i \\
&= \sum_{i=2}^{h} \bigl( 2 \cdot \mathit{Size}_{de} \cdot f^{i-1} \cdot N_Q \cdot m^{i-1} \bigr) \qquad (14)
\end{aligned}$$
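
Because each term of Eq. (14) depends on i only through (f · m)^(i-1), the sum is a finite geometric series. The closed form below is the standard geometric-series identity applied to Eq. (14); it is not a formula stated in the paper, and it assumes f · m ≠ 1 (when f · m = 1 the sum is simply h − 1).

```latex
\begin{aligned}
\mathit{total\_storage}
  &= 2 \cdot \mathit{Size}_{de} \cdot N_Q \cdot \sum_{i=2}^{h} (f \cdot m)^{i-1} \\
  &= 2 \cdot \mathit{Size}_{de} \cdot N_Q \cdot
     \frac{(f m)\,\bigl((f m)^{h-1} - 1\bigr)}{f m - 1},
     \qquad f m \neq 1 .
\end{aligned}
```

For the default setting reported in the experiments (f=8 and m_opt_est=0.563), f · m is about 4.5, which is greater than 1, so under this formula the storage is dominated by the lowest tier and grows by roughly a factor of f · m with each added tier.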